Word predictability after hesitations: a corpus-based study

نویسندگان

  • Elizabeth Shriberg
  • Andreas Stolcke
چکیده

We ask whether lexical hesitations in spontaneous speech tend to precede words that are difficult to predict. We define predictability in terms of both transition probability and entropy, in the context of an N-gram language model. Results show that transition probability is significantly lower at hesitation transitions, and that this is attributable to both the following word and the word history. In addition, results suggest that fluent transitions in sentences with a hesitation elsewhere are significantly more likely than transitions in fluent sentences to contain out-of-vocabulary words and novel word combinations. Such findings could be used to improve statistical language modeling for spontaneous-speech applications.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Developing a Corpus-Based Word List in Pharmacy Research ‎Articles: A Focus on Academic Culture

The present corpus-based lexical study reports the development of a Pharmacy Academic Word List (PAWL); a list of the most frequent words from a corpus of 3,458,445 tokens made up of 800 most recent pharmacy texts including research articles, review articles, and short communications in four sub-disciplines of pharmacy. WordSmith (Scott, 2017) and AntWordProfiler (Anthony, 2014) were used to sc...

متن کامل

Do We Need Discipline-Specific Academic Word Lists? Linguistics Academic Word List (LAWL)

This corpus-based study aimed at exploring the most frequently-used academic words in linguistics and compare the wordlist with the distribution of high frequency words in Coxhead’s Academic Word List (AWL) and West’s General Service List (GSL) to examine their coverage within the linguistics corpus. To this end, a corpus of 700 linguistics research articles (LRAC), consisting of approximately ...

متن کامل

The Provo Corpus: A large eye-tracking corpus with predictability norms.

This article presents the Provo Corpus, a corpus of eye-tracking data with accompanying predictability norms. The predictability norms for the Provo Corpus differ from those of other corpora. In addition to traditional cloze scores that estimate the predictability of the full orthographic form of each word, the Provo Corpus also includes measures of the predictability of the morpho-syntactic an...

متن کامل

Characterization of Hesitations Using Acoustic Models

Spontaneous speech is full of hesitations, such as fillers, word cut-offs, repetitions and segmental extensions. Automatic identification of such hesitations has several applications; however, it is a challenging research problem. In this paper acoustic-phonetic properties of hesitation phenomena are explored in order to identify and annotate some of these events in a spontaneous speech corpus ...

متن کامل

Attention and hesitations in speech 1 Running head: ATTENTION AND HESITATIONS IN SPEECH Attention orienting effects of hesitations in speech: Evidence from ERPs

Filled-pause disfluencies such as um and er affect listeners' comprehension, possibly mediated by attentional mechanisms (Fox Tree, 2001). However, there is little direct evidence that hesitations affect attention. The current study used an acoustic manipulation of continuous speech to induce attention-related ERP components (Mismatch Negativity [MMN] and P300) during the comprehension of fluen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996